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Abstract 

In this paper we study the performance of the most popular bootstrap schemes for mul¬ 
tilevel data. Also, we propose a modified version of the wild bootstrap procedure for hierar¬ 
chical data structures. The wild bootstrap does not require homoscedasticity or assumptions 
on the distribution of the error processes. Hence, it is a valuable tool for robust inference 
in a multilevel framework. We assess the finite size performances of the schemes through a 
Monte Carlo study. The results show that for big sample sizes it always pays off to adopt 
an agnostic approach as the wild bootstrap outperforms other techniques. 

Keywords: Multilevel model; Wild bootstrap, Heteroscedasticity; Cases bootstrap. 


1 Introduction 

Multilevel data consist of units of analysis of different type which are hierarchically clustered. In 
a strictly nested data structure, the term levels represents the different types of unit of analysis, 
i.e. the various types of grouping; in particular, the most detailed level is called the first (or the 
lowest) level. 

A meaningful example of multilevel data comes from studies on educational achievement, 
in which pupils, teachers, classrooms, schools, district, and so on, are clustered one within the 
other, and they might all be units of analysis, each described by own variables. Hierarchical data 
often occur also in social sciences; economists and political scientists frequently work with data 
measured at multiple levels in which individuals are nested in geographic divisions, institutions 
or groups, and so forth. Furthermore, other particular structures of data can be thought as 
multilevel: the repeated measurements over time on an individual, the respondents to the same 
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interviewer and also subjects within a particular study among those of a meta-analysis can be 
considered groups of observations, and, consequently, be treated as multilevel data. 

The idea behind modelling multilevel data is that living environments affect individual be¬ 
haviours, and contextual effects are due to social interactions within an environment. In general, 
individuals can influence and be influenced by various type of contexts: spatial, temporal, orga¬ 
nizational and socio-economic-cultural. As Kreft and Leeuw (1998) put it, “the more individuals 
share common experiences due to closeness in space and/or in time, the more they are similar, 
or, to a certain extent, duplications of each other”; in other words, performances of pupils in the 
same classroom tend to be more similar than those from a different classroom because of sharing 
contexts. This is one of the reasons why the specificity of multilevel data cannot be ignored, be¬ 
cause the observations within one group are not independent of each other, as traditional models 
require. If standard statistical analyses, which generally assume independent observations, are 
performed on multilevel data (the so-called naive pooling strategy), results may be misleading. 

Inference in multilevel models usually relies upon maximum likelihood methods (see for exam¬ 
ple Skrondal and Rabe-Hesketh (2004), Raudenbush and Bryk (2002) and Searle et al. (1992)) 
that mostly use asymptotic approximations for the construction of test statistics and esti¬ 
mation of variances. If the sample size is not large enough, the asymptotic approximation 
does not hold and can lead to incorrect inferences. By using bootstrap methods, under some 
regularity conditions, it is possible to obtain a more accurate approximation of the distribu¬ 
tion of the statistics. The original bootstrap procedure has been studied in detail by Efron 
(1979) for independent and identically distributed (i.i.d.) observations. An extensive discus¬ 
sion of bootstrap methods for a variety of statistical models and for different data structures 
can be found in Davison and Binkley (1997), and in particular for the multilevel structure in 
Van der Leeden et al. (2008) and Goldstein (2010). In the case of hierarchical data three general 
bootstrap approaches are available and well established; the parametric bootstrap, the residual 
bootstrap and the cases bootstrap. 

The aim of the paper is twofold: first, we review and test the finite size performance of the 
three bootstrap schemes for multilevel data by means of a simulation study; second, we propose a 
wild bootstrap procedure for multilevel data. The wild bootstrap does not assume homoscedas- 
ticity and, for this reason, can reveal appropriate for inference robust to heteroscedasticity of 
unknown form. The paper is organized as follows. Section 2 presents the three bootstrap 
schemes for multilevel models. In Section 3 we introduce the wild bootstrap procedure for the 
linear regression model and extend it to the case of hierarchical data. Section 4 presents a Monte 
Carlo study that compares all the methods under different scenarios. Finally, Section 5 contains 
the conclusions. 
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2 Bootstrap procedures for the multilevel model 


Resampling schemes for multilevel models have to take into account the hierarchical structure 
of data. Hence, the classic bootstrap procedures need to be adapted. The main bootstrap 
approaches for a multilevel model are discussed for example in Van der Leeden et al. (2008) and 
Goldstein (2010)): 

1. the parametric bootstrap (resamples from the fitted distribution of the error processes) 

2. the residual bootstrap (resamples residuals from the fitted model) 

3. the cases bootstrap (resamples entire cases) 

The schemes differ in the underlying assumptions. We illustrate them by considering the follow¬ 
ing two-level model that includes k level-1 covariates with fixed coefficients (xjj) and p covariates 
with random coefficients (zjj) 


yij = xJjP + zJjUj + eij, for i = l,...,nj and for j = l,...,J; 


( 1 ) 


with eij ~ 


NID(0,n; 




7'^ = 


1 


Zlj 




and Uj = 


UQj U\j 




NID(0,S), 


where the variance-covariance matrix of the random effects has the following form 


^lo 

<7u01 • 

■ • ^uOp 

O'ulO 


• • ^ulp 

^upO 

^upl 

2 

• • ^up 


Moreover, it is assumed that the random effects for the group j, the vectors Uj, and the within- 
group errors, eij, are independent. Finally, we denote hy 6 = {/3,(T£,11} the (restricted) maxi¬ 
mum likelihood estimates of the parameters of the model (1). 

2.1 The parametric bootstrap 

The parametric bootstrap assumes that the covariates are fixed and that both the model and the 
error distributions are correctly specified. Compared to the other two approaches, such scheme 
has the strongest requirements. In practice, the parametric bootstrap for model (1) generates 
resamples as follows: 
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1. draw N elements e*j from the estimated distribution of level-1 errors, ~ -^(0, where 
N = Ylj=i is total sample size; 

2. draw J (p-l-l)-vectors of elements u* from the estimated distribution of the random effects, 

4~iV(0,S); 

3. generate the bootstrap responses as y*^ = + zJjUj -|- e*j 

4. compute the bootstrap value 6 on the generated sample; 

5. repeat steps 1 — 4 B times as to obtain B sets of bootstrap replications of the parameters. 

As remarked above, the parametric bootstrap is not robust with respect to any deviation from 
the normality assumption on the error terms so that severe problems can occur in such cases. 

2.2 The residual bootstrap 

The residual bootstrap was introduced in the multilevel framework by Carpenter et al. (2003); 
it treats the covariates as fixed and assumes that the model specification is correct. No distribu¬ 
tional assumptions on error terms are required but only variance homogeneity among groups. In 
the classic linear regression framework, the scheme resamples with replacements from the resid¬ 
uals of the ht. The implementation of this procedure in a multilevel model leads to a distortion 
because the residuals are shrunken towards zero so that the true variability of the residuals 
is not reproduced in the resamples (Goldstein, 2010). Therefore, it is necessary to reflate the 
shrunken residuals. The procedure generates bootstrap samples as follows: 

1. compute level-2 and level-1 shrunken residuals from model (1), respectively 

h, = fzj(zf+ allN)~\yj - Xj^) 

and eij = yij - x.Jj0 - zfjUj Vi, j, 

where f = diag(S, S,..., S) is the ML estimate of the block-diagonal variance-covariance 
matrix of the whole random-effects matrix U = [ui, U 2 ,. .., uj]"^; both level-1 and level-2 
residuals must be centered, since in the multilevel model their mean is not zero; 

2. reflate the (centered) residuals, that is, consider a transformation U = UA such that 
tb^U/J = r. In other words, we need to achieve that estimates of the variance obtained 
by means of the shrunken residuals equal the maximum likelihood estimates obtained 
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from the model; otherwise, the procedure would lead to downward biased estimates of the 
variance parameters. For level-2 residuals we have U^U/J = A^U^UA/J = A'^SA = F; 
one possible choice of A is A = where L 5 and Lp are the Cholesky decompositions 

of S and F respectively. The same procedure is applied to level-1 residuals; 

3. draw with replacement J vectors from the set of reflated level-2 residuals, {uj} ; 

4. draw with replacement N elements e*j from the set of reflated level-1 residuals {eij}; 

5. generate the bootstrap responses as y*^ = + % Vi,j; 

6 . compute bootstrap values 9 on the generated sample; 

7. repeat steps 3 — 6 B times as to obtain B bootstrap replications of the parameters. 

Since, it does not rely on any distributional assumptions, the residual bootstrap is robust 
with respect to non-normality of the error processes. By means of a simulation study on a two- 
level model with Xi errors, Carpenter et al. (2003) show that the empirical coverage of bootstrap 
confidence intervals is better for the residual bootstrap than for the parametric bootstrap; also, 
they show that the most important improvements concern the estimates of random parameters. 

2.3 The cases bootstrap 

The cases bootstrap for the linear regression model was proposed in Freedman (1981) under the 
name pairs bootstrap. Of the three bootstrap procednres for multilevel model reviewed here, 
the cases bootstrap has the least restrictive assumptions; it assumes that only the hierarchical 
dependency in the data is correctly specified and considers the covariates as random variables. 
The procedure resamples entire cases as follows: 

1. draw with replacement J units from the set of numbers {1,2,... , J}; any drawn index 
j' = 1,..., J is associated to a whole level-2 unit (yj/, X^/, Z^/); 

2. for each selected level-2 nnit (yj/,Xj/, Zj/), j' = 1,..., J, draw with replacement a boot¬ 
strap sample (y*,, X*,, Z*, ) of size nj '; 

3. compute the bootstrap value 9 on the generated sample; 

4. repeat steps 1 — 3 B times as to obtain B bootstrap replications of the parameters. 

The scheme of the cases bootstrap described above resamples both level-1 and level-2 units. 
However, whether this makes sense depends on the nature of the data. There may be instances 
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in which it is correct to resample only level-1 (or level-2) units. For instance, when level-2 
units are individuals and level -1 units are repeated measures, it is appropriate to resample 
only level -2 units; then, for each level -2 unit drawn all the associated level -1 units enter the 
resample. On the contrary, if level-2 units are countries or time points and level-1 units are 
individuals, it is more appropriate to resample only level -1 units within countries (or time 
points). For further discussion on the matter see Van der Leeden et al. (2008). The cases 
bootstrap (appropriately implemented) provides consistent estimators under heteroscedasticity 
at the price of less efficient estimators than the parametric and the residual bootstrap (Flachaire, 
1999; Davison and Hinkley, 1997). 


3 The wild bootstrap 

The wild bootstrap is a technique aimed to obtain consistent estimators for the covariance matrix 
of the coefficients of a regression model when the errors are heteroscedastic. It was developed 
by Liu (1988) following suggestions in Wu (1986) and Beran (1986). Further evidences and 
refinements are provided in Flachaire (2004) and Davidson and Flachaire (2008). Consider the 
classic linear regression model yi = x7/3 -|- i/j, for i = 1,..., n. The disturbances Vi are assumed 
to be mutually independent and to have zero mean, but they are allowed to be heteroscedastic. 
Moreover, the covariates are assumed to be strictly exogenous. 

In the homoscedastic case, the variance of the residuals is proportional to 1 — hi, where hi = 
xl (X"^X)“^Xj is the i-th diagonal element of the orthogonal projection matrix H = X(X^X)~^X^ 
This suggests to replace the heteroscedastic residuals Vi with 

Vi = Vi/ a/(1 - hi) or Vi = Vi/(1 - hi) (2) 

in order to reduce the bias of the variance estimator, as we use to do in the homoscedastic 
case by using the unbiased OLS estimator. These are two of the Heteroskedasticity Consistent 
Covariance Matrix Estimator (HCCME) forms considered in MacKinnon and White (1985), 
that Flachaire (2004) refers to as HC 2 and HC 3 respectively. We omit to mention the other 
forms of HCCME since, as is shown in MacKinnon and White (1985) and Chesher and Jewitt 
(1987), HC 2 and HC 3 outperform the others in terms of power and size of the tests; instead the 
two forms in ( 2 ) cannot be ranked, although in some simulation experiments HC 3 has shown 
the least distortion (see for example Davidson and Flachaire (2008)). 

The wild bootstrap procedure tries to recover the unknown form of heteroscedasticity of the 
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errors by means of the following bootstrap data-generating process: 


y* =^10 + ViWi 


( 3 ) 


where (3 is the vector of estimated regression coefficient, Vi is one of the variants of HCCME 
(such as those in ( 2 )) where the residuals have been transformed and the Wi (for i = 1,... ,n) 
are mutually independent errors drawn from an auxiliary distribution with zero mean and unit 
variance. While Mammen (1993) suggests the following asymmetric two-point distribution for 


the Wi 


Fi : 


Wi= < 


-iV5-l)/2 

(V5 + l)/2 


with probability p = (\/5 -|- l)/(2\/5) 
with probability 1 — p, 


(4) 


Liu (1988) mentions, instead, the distribution 


F2 : 



with probability 
with probability 


0.5 

0.5. 


(5) 


Based on the evidence of the simulation study, Davidson and Flachaire (2008) recommend the 
auxiliary distribution F 2 rather than other versions. For further discussions on this choice, see 
also Liu (1988) and Belsley and Kontoghiorghes (2009). 

The wild bootstrap procedure is as follows: 

1 . draw n independent values, Wi, for i = 1 ,... ,n, from an auxiliary distribution with zero 
mean and unit variance such as Fi and F 2 ; 

2. generate the bootstrap samples according to Eq. (3) 

3. compute the bootstrap value 9 on the generated sample y*; 

4. repeat steps 1 — 3 B times as to obtain B bootstrap replications of the parameters. 

Note that also the pairs bootstrap, called cases bootstrap in multilevel models (subsection 2.3), is 
used to overcome the problem of heteroscedasticity of unknown form. However, Flachaire (2004) 
shows that the version of wild bootstrap with F 2 in (5) recommended in Davidson and Flachaire 
(2008), provides better numerical performance in terms of false rejection probability and power 
of a test than both other versions of wild bootstrap and the pairs bootstrap. Given these results 
for regression models, we expect that the wild bootstrap behaves similarly when applied to 
multilevel data. However, to our knowledge, this technique has never been implemented and 
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used in a multilevel framework. Hence, we provide a modified version of the wild bootstrap that 
could be useful in cases of hierarchical and heteroscedastic data. The next subsection is devoted 
to this matter. 

3.1 The wild bootstrap for multilevel models 

Now, we adapt the procedure presented above to the case of hierarchical data. Consider the 
classic multilevel model in Eq. (1) but for the {rij x 1) response of the generic group j: 

Yj = Xj/l + ifj, with i^j = ZjUj + ej, ( 6 ) 

for all j = 1,..., J. Note that the wild bootstrap procedure requires the disturbances to be 
mutually independent. However, in the multilevel framework the compound error terms of two 
generic units in the group j, i^ij = and t'j/j = zJ,jUj + Cj/j, are not independent by 

dehnition. Instead, the error terms corresponding to the whole generic group j, vj, are mutually 
independent. Therefore, handling the vectorial form of the multilevel model ( 6 ), rather than the 
univariate form ( 8 ), allows from one hand to put oneself in the same situation of the classic wild 
bootstrap procedure, and, from another hand, to take into account the intra-class correlation of 
hierarchical data. 

Denoting with Uj = the j-th diagonal block of the orthogonal projection matrix 

associated to the design matrix H, the HCCME expressions in (2) become 

HC 2 : Vj = diag(lnj - o vj 

HC 3 : Vj = diag(lnj - Hj) o v^, (7) 

where the vector of the residuals for the j-th group is computed as 

Vj = Yj - 

and the operator “o” denotes the Hadamard (or entry wise) product. 

We suggest to implement the wild bootstrap for a multilevel model as follows: 

1. draw j independent values, Wj, for i = j,..., J, from an auxiliary distribution with zero 
mean and unit variance such as Fi and T 2 ; 


2 . generate the bootstrap samples 


Yj = + ^jWj, 

where the transformed residuals Vj are as in (7); 

3. compute the bootstrap value 9 on the generated sample; 

4. repeat steps 1 — 3 B times as to obtain B bootstrap replications of the parameters. 


4 Monte Carlo study 

In this section we present the results of a Monte Carlo study that compares the three bootstrap 
procedures described above (Section 2) with the adapted wild bootstrap procedure for multilevel 
data (Section 3.1). Here, the wild bootstrap is implemented with Fi as in Eq. (4) as auxiliary 
distribution and HC 2 as in Eq. (7) as HCCME form. Further, we study the behaviour of the 
wild bootstrap with different choices of auxiliary distributions and HCCME forms. Consider 
the following two-level model 


Vij = /3o + uoj + (/3i + uij)xuj + eij, (8) 

for level-1 units i = 1,... ,nj and level-2 units j = 1,..., J. In this study, the clusters contain 
the same number of level-2 units, that is rij = n for all j. The true values chosen for the 
regression coefficients are /3o = 3 and /3i = 5 and the xijj’s are simulated from a standard 
normal distribution. 

In order to assess the finite size performance of the bootstrap schemes in presence of non¬ 
constant variance we generate samples with both homoscedastic and heteroscedastic level -1 
errors, €ij. To this aim, we define 

and set Sij = 1 for all i,j for generating homoscedastic data, and Sij = xuj for obtaining 
heteroscedastic data. Hence, the variance of level-I error terms is where af, = V(r'ij) 

for all i and j. Also, we control for deviations from normality by adopting two different error 
distributions. In the first case, we draw N values of Uij ~ ^"(0, = 2) and, for each j = 1,..., J, 
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we draw a sample {uQj,uij) from the bivariate normal distribution: 



UQj 


0 


<^lo = 2 

^uoi — 0*5 

Ulj 

V 

0 


o'uoi = 0.5 

^ni = 2 


As for the non Gaussian case, we draw N values of Uij a from Xi ~ 1 distribution and, for each 
j = 1,... ,J, we draw a sample {hi, / 12 ) from the bivariate normal distribution 


hi 

( 

0 


1 0.5 


~ iV 


1 


h2 

V 

0 


0.5 1 


then, we set uoj = hf — 1 and uij = h 2 — 1 so that both random effects have marginal (xf — 1)- 
distribution with mean 0, variances ct^q and equal to 2 and covariance auoi equal to 0.5. 

In the study, we vary also the group size n and the number of groups J as follows: 



n 

J 

N 

1 

10 

20 

200 

2 

10 

80 

800 

3 

20 

40 

800 

4 

40 

80 

3200 


Note that settings 1 and 2 differ in the number of groups whereas settings 2 and 4 differ 
in the group size. For each setting, we simulate 500 datasets from model (8), and for each 
dataset we compute restricted maximum likelihood estimates of the parameters. The num¬ 
ber of bootstrap replications \s B = 999, hence we obtain B replications of the parameters, 
6 = {/3q,/ 3J', (dg)*, (ct^q)*, ((T„oi)*) Finally, we derive 95% bootstrap confidence inter¬ 

vals (Davison and Hinkley, 1997; Van der Leeden et ah, 2008) by sorting the bootstrap replica¬ 
tions and taking the following two percentiles 

^100o/2>^100(1-Q!/2) 

with a = 0.05. As measures of performance we take the empirical coverage of the confidence 
interval and its average length. Table 1 summarizes the scenarios of the simulation study 
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indicating the tables of the results for each case. 


Table 1: Summary of the scenarios of the Monte Carlo study. 


Errors 

Homoscedastic 

Heteroscedastic 

Gaussian 

x?-i 

Scenario 1: Table 2 
Scenario 3: Table 4 

Scenario 2: Table 3 
Scenario 4: Table 5 


In the case of Gaussian and homoscedastic errors (scenario 1, Table 2), all the bootstrap 
procedures considered behave quite well. In particular, for regression coefficients, the residual 
bootstrap has the best performance (even though by a tiny margin) in all the four sample sizes. 
As for the estimation of cr^, the wild bootstrap always produces a coverage of 100% at the 
price of wider confidence intervals. Moreover, for level-2 parameters, the intervals based on 
the parametric bootstrap have the highest coverage when both the level-1 and level-2 sample 
sizes are low. However, as the sample sizes increases, the wild bootstrap behaves better other 
schemes for almost all parameters. When the level-1 errors are Gaussian but heteroscedastic 
(scenario 2, Table 3), the coverage of the parametric bootstrap is good for all but the within- 
group variance cr^. The same happens for the residual and the wild bootstrap but with some 
important differences. In fact, the wild bootstrap has always the highest coverage when the 
sample sizes increase. Also, as in scenario 1, the coverage for the within-group variance a‘1 is 
rather low for all the schemes but the wild bootstrap. In the case of homoscedastic errors with 

— 1 distribution (scenario 3, Table 4), the parametric bootstrap produces a small coverage 
not only for the within-group variance but also for level-2 parameters. On the other hand, the 
coverage of the residual bootstrap is still good, in line with the results in Garpenter et al. (2003). 
However, when the sample sizes increase, the wild bootstrap produces again the highest coverage 
in every instance. Lastly, when the xf — 1-errors are heteroscedastic (scenario 4, Table 5), the 
results are not clear cut as there is not a definite winner. The parametric bootstrap has low 
coverage mainly when variances are involved. The residual bootstrap has a fair performance, 
but, again, with big sample sizes the wild bootstrap assures good coverage and length for all the 
parameters. Finally, note that the cases bootstrap is always outperformed by other procedures, 
as happens in linear regression models (see among the others Davison and Hinkley (1997) and 
Flachaire (1999)). 

In Figures 1 and 2 we provide a visual comparison of the bootstrap methods in the het¬ 
eroscedastic scenario. The figures show a single instance of the distributions of the bootstrap 
replications for and auoi, respectively where the samples have heteroscedastic Gaussian er¬ 
rors. The left panels show the result for small sample size {N = 200) whereas the right panels 
refer to a big sample size [N = 3200). Figure 1 shows that, in this example, both the paramet- 
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Table 2 : Scenario 1: Gaussian and homoscedastic lev-1 errors. Coverage (%) and average length 
(in parenthesis) for 95% bootstrap confidence intervals. 



Parametric 

Residual 

Cases 

Wild 

Parametric 

Residual 

Cases 

Wild 


n = 10, 

j = 20 , 

N = 200 


n = 20, 

J = 40, 

N = 800 


/3o 

95.2 

95.2 

90.6 

94.2 

94 

94 

85 

93 


(1.32) 

(1.32) 

(1.1) 

(1.29) 

(0.89) 

(0.89) 

(0.71) 

(0.88) 

/3i 

94.4 

94.8 

90.4 

93.6 

93.6 

94.4 

87.4 

93.4 


(1.33) 

(1.32) 

(1.14) 

(1.3) 

(0.89) 

(0.89) 

(0.72) 

(0.88) 


94.4 

94.2 

72.2 

100 

94.6 

94 

72.6 

100 


(0.31) 

(0.31) 

(0.39) 

(0.7) 

(0.15) 

(0.15) 

(0.19) 

(0.46) 


94.2 

92 

94.2 

93.6 

92.8 

91.2 

90 

94.4 

(1.01) 

(0.96) 

(0.88) 

(1) 

(0.65) 

(0.63) 

(0.53) 

(0.73) 

O-uOl 

97.4 

96.6 

92.4 

89.6 

96.2 

95 

88.6 

92.8 


(2.09) 

(1.97) 

(1.96) 

(1.97) 

(1.32) 

(1.27) 

(1.12) 

(1.3) 


91.6 

90.4 

93.2 

91.4 

92.2 

91 

89 

93.6 


(1.03) 

(0.99) 

(0.95) 

(1.02) 

(0.66) 

(0.63) 

(0.54) 

(0.73) 


n = 10, 

J = 80, 

N = 800 


n = 40, 

J = 80, 

N = 3200 


00 

94.8 

94.8 

90.2 

94.6 

94.4 

94.6 

87.6 

93.8 


(0.65) 

(0.66) 

(0.55) 

(0.65) 

(0.63) 

(0.63) 

(0.49) 

(0.62) 

Pi 

93.2 

93.6 

89.6 

93.4 

95 

95.4 

88.6 

95 


(0.66) 

(0.66) 

(0.56) 

(0.65) 

(0.63) 

(0.63) 

(0.49) 

(0.62) 


95 

95 

13 

100 

94.8 

94.4 

75.4 

100 


(0.16) 

(0.16) 

(0.2) 

(0.35) 

(0.07) 

(0.07) 

(0.1) 

(0.32) 


93 

92 

90.6 

95.6 

93.6 

92.6 

88 

95.8 

(0.49) 

(0.48) 

(0.42) 

(0.56) 

(0.45) 

(0.44) 

(0.35) 

(0.53) 

O-uQl 

95.4 

94.2 

93.2 

94.2 

94.2 

91.2 

87.8 

94.8 


(0.99) 

(0.98) 

(0.94) 

(1.01) 

(0.91) 

(0.89) 

(0.74) 

(0.93) 

^ul 

92.6 

92 

90.6 

94.8 

94 

94 

88.8 

97.6 


(0.5) 

(0.49) 

(0.45) 

(0.57) 

(0.45) 

(0.44) 

(0.36) 

(0.53) 


ric and the residual bootstrap behave well for cr^. On the contrary, as shown in Figure 2, for 
the covariance fj^oi the wild bootstrap outperforms the other methods, both for small and bug 
sample sizes. 

Now, we focus on the impact of different choices of HCCME forms and auxiliary distributions 
on the performance of the wild bootstrap. Table 6 reports the results of a simulation study 
that investigates the coverage of bootstrap percentile intervals for different versions of the wild 
bootstrap in the same four scenarios as before. The sample sizes considered here are n = 20 
and J = 40. The results show clearly that the best behaved version of the wild bootstrap in all 
the scenarios is the one with the auxiliary distribution Fi (Eq. 4) and HC 3 (Eq. 7) as HCCME. 
These results are somehow in contrasts with those of Davidson and Elachaire (2008) which show 
that for a classic regression model the distribution F 2 is always the best choice. 


5 Conclusions 


In this paper we have investigated the performance of three well established bootstrap schemes 
for multilevel models: the parametric bootstrap, the residual bootstrap and the cases bootstrap. 
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Table 3: Scenario 2: Gaussian and heteroscedastic lev-1 errors. Coverage (%) and average length 
(in parenthesis) for 95% bootstrap confidence intervals. 



Parametric 

Residual 

Cases 

Wild 

Parametric 

Residual 

Cases 

Wild 


n = 10, 

J = 20, 

N = 200 


n = 20, 

J = 40, 

N = 800 


/3o 

94.8 

94.6 

88.2 

93.8 

92.2 

92.2 

83.8 

91.2 


(1.32) 

(1.32) 

(1.1) 

(1.27) 

(0.89) 

(0.89) 

(0.71) 

(0.87) 

/3i 

94.4 

94.2 

91.2 

93.8 

94.2 

94.2 

90.2 

93.8 


(1.33) 

(1.32) 

(1.14) 

(1.36) 

(0.89) 

(0.89) 

(0.72) 

(0.92) 


48.2 

65.8 

47.6 

89.4 

50 

69 

49.2 

98.6 


(0.31) 

(0.31) 

(0.39) 

(0.73) 

(0.15) 

(0.15) 

(0.19) 

(0.49) 


92.4 

91.6 

90.8 

92.6 

91.4 

88.4 

85.4 

93 

(1.01) 

(0.96) 

(0.88) 

(0.99) 

(0.65) 

(0.63) 

(0.53) 

(0.72) 

(^uOl 

98 

97.2 

93.2 

91.4 

96.4 

95.8 

90.8 

92.6 


(2.09) 

(1.97) 

(1.96) 

(2.05) 

(1.32) 

(1.27) 

(1.12) 

(1.34) 


95.6 

93 

94.6 

95.4 

95 

92.4 

90.6 

96.4 


(1.03) 

(0.99) 

(0.95) 

(1.06) 

(0.66) 

(0.63) 

(0.54) 

(0.76) 


n = 10, 

J = 80, 

N = 800 


n = 40, 

J = 80, 

N = 3200 


00 

95.4 

95.6 

88.8 

95.8 

93.8 

94 

86.6 

94 


(0.65) 

(0.66) 

(0.55) 

(0.64) 

(0.63) 

(0.63) 

(0.49) 

(0.62) 

pi 

95.8 

96 

92.2 

95.2 

95.4 

95 

88.4 

94.8 


(0.66) 

(0.66) 

(0.56) 

(0.69) 

(0.63) 

(0.63) 

(0.49) 

(0.63) 


20.6 

37.6 

5.4 

77.4 

45.2 

72.6 

50.8 

100 


(0.16) 

(0.16) 

(0.2) 

(0.38) 

(0.07) 

(0.07) 

(0.1) 

(0.33) 


94.6 

93.2 

92.4 

96.8 

94 

93 

85.6 

96.6 

(0.49) 

(0.48) 

(0.42) 

(0.56) 

(0.45) 

(0.44) 

(0.35) 

(0.53) 

O-uQl 

95.2 

95.6 

93.6 

94.8 

93.8 

91.4 

90.2 

95.6 


(0.99) 

(0.98) 

(0.94) 

(1.05) 

(0.91) 

(0.89) 

(0.74) 

(0.95) 


90.2 

89.4 

73.8 

96.2 

97 

96.2 

91.2 

99 


(0.5) 

(0.49) 

(0.45) 

(0.59) 

(0.45) 

(0.44) 

(0.36) 

(0.54) 


Table 4: Scenario 3: Xi ~ 1 homoscedastic lev-1 errors. Coverage (%) and average length 
(in parenthesis) for 95% bootstrap confidence intervals. 



Parametric 

Residual 

Cases 

Wild 

Parametric 

Residual 

Cases 

Wild 


n = 10, 

J = 20, 

N = 200 


n = 20, 

J = 40, 

N = 800 


Po 

89.4 

89.8 

86.6 

89.6 

92.2 

92.4 

85 

91.6 


(1.26) 

(1.25) 

(1.04) 

(1.19) 

(0.88) 

(0.87) 

(0.69) 

(0.85) 

Pi 

89.6 

89.8 

88.6 

90 

91 

91.6 

84.6 

91.8 


(1.26) 

(1.25) 

(1.08) 

(1.19) 

(0.87) 

(0.87) 

(0.69) 

(0.84) 


61.2 

84.6 

81.4 

96.8 

54.8 

91.4 

90.6 

99.6 


(0.31) 

(0.57) 

(0.77) 

(0.85) 

(0.15) 

(0.33) 

(0.46) 

(0.56) 


60.6 

70.8 

74.8 

65.2 

56.4 

75.2 

69 

73 

(0.98) 

(1.33) 

(1.21) 

(1.11) 

(0.65) 

(1.08) 

(0.88) 

(0.95) 

o-uOl 

84.2 

82 

80.6 

66.4 

85.4 

81.8 

78 

78 


(1.94) 

(2.02) 

(2.19) 

(2.01) 

(1.27) 

(1.49) 

(1.28) 

(1.44) 


61.8 

68.2 

79.6 

66.2 

59.6 

74.8 

71.6 

74.2 


(0.99) 

(1.25) 

(1.3) 

(1.12) 

(0.64) 

(1.01) 

(0.87) 

(0.93) 


n = 10, 

J = 80, 

N = 800 


n = 40, 

J = 80, 

N = 3200 


00 

94.6 

95.2 

91.4 

95.2 

93.2 

94.2 

84.4 

93 


(0.64) 

(0.64) 

(0.53) 

(0.63) 

(0.61) 

(0.61) 

(0.47) 

(0.6) 

Pi 

94.2 

95.2 

89.4 

94.6 

92.2 

93.6 

85 

93.8 


(0.65) 

(0.65) 

(0.55) 

(0.64) 

(0.61) 

(0.61) 

(0.47) 

(0.6) 


57.4 

89.6 

71 

98.4 

55.8 

94.4 

95.8 

100 


(0.15) 

(0.32) 

(0.44) 

(0.47) 

(0.07) 

(0.18) 

(0.25) 

(0.36) 

^lo 

62.8 

84 

87 

84.6 

55.2 

83 

73.6 

83.4 

(0.48) 

(0.85) 

(0.72) 

(0.84) 

(0.44) 

(0.88) 

(0.67) 

(0.82) 

O-uOl 

7.6 

82.4 

85.6 

54.2 

76 

76.8 

72.4 

77.2 


(0.5) 

(1.19) 

(1.16) 

(0.51) 

(0.87) 

(1.13) 

(0.91) 

(1.12) 


59.4 

79 

87.8 

82.6 

54.4 

82.2 

71 

82.8 


(0.49) 

(0.83) 

(0.76) 

(0.84) 

(0.44) 

(0.83) 

(0.66) 

(0.8) 
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Table 5: Scenario 4: Xi ~ 1 heteroscedastic lev-1 errors. Coverage (%) and average length 
(in parenthesis) for 95% bootstrap confidence intervals. 



Parametric 

Residual 

Cases 

Wild 

Parametric 

Residual 

Cases 

Wild 


n = 10, 

J = 20, 

N = 200 


n = 20, 

J = 40, 

N = 800 


/3o 

90 

89.6 

84.8 

89 

93 

93 

84 

92.6 


(1.26) 

(1.25) 

(1.04) 

(1.19) 

(0.88) 

(0.87) 

(0.69) 

(0.84) 

/5i 

90.6 

91.8 

89.2 

90.6 

91.6 

91.2 

87 

91.4 


(1.26) 

(1.25) 

(1.08) 

(1.27) 

(0.87) 

(0.87) 

(0.69) 

(0.88) 

O-e 

33.8 

62.6 

60.8 

78.4 

30.6 

72.8 

73.2 

90 


(0.31) 

(0.57) 

(0.77) 

(0.87) 

(0.15) 

(0.33) 

(0.46) 

(0.63) 

O 

CS S 

b 

63.2 

70.4 

74 

65.6 

54.6 

76.2 

68.8 

74.6 

(0.98) 

(1.33) 

(1.21) 

(1.1) 

(0.65) 

(1.08) 

(0.88) 

(0.95) 

<yuoi 

84.4 

80.8 

81.2 

66.8 

88 

83.4 

81.2 

78.6 


(1.94) 

(2.02) 

(2.19) 

(2.09) 

(1.27) 

(1.49) 

(1.28) 

(1.48) 

^ul 

69 

76.2 

86.6 

72.8 

65.4 

78.4 

81.4 

78.4 


(0.99) 

(1.25) 

(1.3) 

(1.13) 

(0.64) 

(1.01) 

(0.87) 

(0.94) 


n = 10, 

J = 80, 

N = 800 


n = 40, 

J = 80, 

N = 3200 


i^o 

93 

93.4 

89.6 

92.6 

92.8 

93 

83.6 

93 


(0.64) 

(0.64) 

(0.53) 

(0.62) 

(0.61) 

(0.61) 

(0.47) 

(0.6) 

pi 

94.8 

94.2 

91.4 

94.4 

93.6 

93.4 

87.4 

93.2 


(0.65) 

(0.65) 

(0.55) 

(0.67) 

(0.61) 

(0.61) 

(0.47) 

(0.61) 

O-e 

18.6 

56.2 

40.8 

74.8 

27.6 

84 

84 

95.4 


(0.15) 

(0.32) 

(0.44) 

(0.53) 

(0.07) 

(0.18) 

(0.25) 

(0.41) 


62.2 

85 

83.2 

84.8 

56.6 

83.8 

73.2 

82.4 

(0.48) 

(0.85) 

(0.72) 

(0.84) 

(0.44) 

(0.88) 

(0.67) 

(0.82) 

O'uQl 

2.8 

83 

85.2 

44.6 

76.6 

77 

75.8 

79 


(0.5) 

(1.19) 

(1.16) 

(0.5) 

(0.87) 

(1.13) 

(0.91) 

(1.14) 

O'Sl 

69.2 

90.4 

91.6 

89.8 

60.4 

85 

79.2 

84.8 


(0.49) 

(0.83) 

(0.76) 

(0.83) 

(0.44) 

(0.83) 

(0.66) 

(0.8) 


Table 6: Simulation study with different versions of the wild bootstrap (re = 20 and J = 40). 



HCz 

Fi F 2 

HC3 

Fi F 2 

HC2 

Fi F 2 

HC3 

Fi F 2 


Homoscedastic lev-1 

errors 

Heteroscedastic lev-1 

errors 


/^o 

93 

93.2 

93 

93.2 

91.2 

92 

91.2 

92 



(0.88) 

(0.87) 

(0.88) 

(0.87) 

(0.88) 

(0.87) 

(0.88) 

(0.87) 


Pi 

93.4 

93.2 

93.4 

93.4 

93.8 

94.2 

94 

94.2 



(0.88) 

(0.88) 

(0.89) 

(0.88) 

(0.88) 

(0.88) 

(0.89) 

(0.88) 



100 

0 

100 

0.6 

98.6 

0.2 

98.6 

0 



(0.46) 

(0.001) 

(0.46) 

(0.001) 

(0.46) 

(0.001) 

(0.46) 

(0.001) 


^lo 

94.4 

19.6 

94.4 

19.8 

93 

21.6 

93 

22 

O 

(0.73) 

(0.09) 

(0.73) 

(0.09) 

(0.73) 

(0.09) 

(0.73) 

(0.09) 


<yuQi 

92.8 

23.6 

92.8 

23.4 

92.6 

27 

92.8 

27 



(1.3) 

(0.23) 

(1.3) 

(0.23) 

(1.3) 

(0.23) 

(1.3) 

(0.23) 


O'Sl 

93.6 

24.2 

94 

24.8 

96.4 

21.2 

96.8 

21 



(0.73) 

(0.1) 

(0.74) 

(0.1) 

(0.73) 

(0.1) 

(0.74) 

(0.1) 


/3o 

91.6 

91.8 

91.6 

91.8 

92.6 

91 

92.8 

91 



(0.85) 

(0.84) 

(0.85) 

(0.84) 

(0.85) 

(0.84) 

(0.85) 

(0.84) 


Pi 

91.8 

89.2 

91.8 

89.2 

91.4 

90.6 

91.4 

90.8 



(0.84) 

(0.84) 

(0.84) 

(0.84) 

(0.84) 

(0.84) 

(0.84) 

(0.84) 



99.6 

0.2 

99.6 

0.2 

90 

0 

90.6 

0 

1 


(0.56) 

(0.001) 

(0.56) 

(0.001) 

(0.56) 

(0.001) 

(0.56) 

(0.001) 

(N--H 


73 

8 

73.2 

7.4 

74.6 

7 

74.4 

7 


(0.95) 

(0.09) 

(0.95) 

(0.09) 

(0.95) 

(0.09) 

(0.95) 

(0.09) 


o-uOl 

78 

18.6 

78.2 

19 

78.6 

19.4 

78.6 

19.4 



(1.44) 

(0.21) 

(1.45) 

(0.21) 

(1.44) 

(0.21) 

(1.45) 

(0.21) 



74.2 

8 

74.8 

7.8 

78.4 

9.8 

78.8 

9.6 



(0.93) 

(0.09) 

(0.94) 

(0.09) 

(0.93) 

(0.09) 

(0.94) 

(0.09) 
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Figure 1: Distributions of the bootstrap replications for the within-group variance cr^ for two 
samples with heteroscedastic Gaussian errors: (left) small sample size (n = 10, J = 20); (right) 
big sample size (n = 80, J = 80). 




O'uOl O'uOl 

Figure 2: Distributions of the bootstrap replications for the covariance between the random slope 
and the random intercept cr„oi for two samples with heteroscedastic Gaussian errors: (left) small 
sample size (n = 10, J = 20); (right) big sample size (n = 80, J = 80). 
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Also, we have introduced a modified version of the wild bootstrap procedure which is partic¬ 
ularly suitable to hierarchical data. We have assessed the hnite size performances of the four 
bootstrap schemes by means of a Monte Carlo study where we have varied sample size, error 
distribution and error variance. Both the cases bootstrap and the wild bootstrap do not require 
homoscedasticity and do not make distributional assumptions on the error processes. Still, the 
performance of the two schemes is very different in terms of coverage and length of conhdence 
intervals. In fact, except for some specihc instances, the cases bootstrap has the worst perfor¬ 
mance of all the four bootstrap schemes, no matter the sample size or the kind of errors. On the 
contrary, for big sample sizes the wild bootstrap outperforms the three competitors in all the 
scenarios considered including the Gaussian homoscedastic case. This is especially true as far 
as estimation of variance components is concerned. In fact, in case of estimation of regression 
coefficients, both the parametric and the residual bootstrap behave quite well and are robust 
with respect to non-Gaussianity and heteroscedasticity. The estimation of level-I variance cr^, 
instead, is more problematic: the parametric bootstrap performs very poorly when the assump¬ 
tions of normality and homoscedasticity are violated; in these cases, the residual bootstrap, 
behaves better than the parametric bootstrap but it is still outperformed by the wild bootstrap 
in all the scenarios, with the most dramatic worsening in the heteroscedastic case. 

In conclusion, we advocate the use of the proposed version of the wild bootstrap in a mul¬ 
tilevel framework especially when the validity of the assumptions underlying the model is ques¬ 
tionable, as well as when the sample is sufficiently large. 
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